We report the 5.101-Mbp high-quality draft genome assembly of Escherichia coli strain ATCC 23506 (serovar O10:K5:H4, also known as NCDC Bi 8337-41). This uropathogenic strain, commonly referred to as E. coli K5, produces N-acetyl heparosan, a glycosaminoglycan-like capsular polysaccharide and the precursor to the anticoagulant pharmaceutical heparin. Metabolic reconstruction from this genome will enable prediction of the gene deletions and overexpressions that increase heparosan production.
Escherichia coli is the best-characterized organism commonly used in metabolic engineering research. As metabolic engineers seek to further increase the native production capacities of particular E. coli strains, it is valuable to develop genome-scale reconstructions (GSRs) of their metabolism in order to accurately predict the genetic interventions that lead to an improved phenotype. The K5 capsule is composed of N-acetyl heparosan, a group II capsular polysaccharide (CPS) consisting of a repeating [→4)-β-D-glucuronic acid (GlcA)-(1→4)-N-acetyl-α-D-glucosamine (GlcNAc)-(1→]n disaccharide unit (1). Although the gene cluster encoding the enzymes required for K5 CPS biosynthesis has been characterized previously, annotation of the whole genome sequence will provide further insight into the molecular mechanisms of CPS biosynthesis and transport. Characterizing all genes involved in lipopolysaccharide (LPS) biosynthesis will also improve understanding of CPS-LPS interactions, while comparative genomic studies between this uropathogenic E. coli (UPEC) strain and nonpathogenic strains may identify the virulence factors required for infection of the urinary tract.

Genomic DNA was purified from E. coli strain ATCC 23506 with an Invitrogen PureLink Genomic DNA mini kit. The genome was sequenced on an Illumina HiSeq 2000 system, which produced 104 million paired-end reads of 101 bp with an insert size of 400 bp. Approximately 28 million randomly sampled reads were assembled with Velvet v1.2.07 (2) at an optimal hash length (k-mer size) of 93. The final assembly has approximately 38-fold coverage and contains 190 supercontigs composed of 224 contigs (>200 bp in length), with a total size of 5,101,025 bp, an N50 contig length of 129,677 bp, and a mean G+C content of 50.6%. Assembly data were deposited in the EMBL nucleotide sequence database.

The draft genome was annotated by the Rapid Annotations using Subsystems Technology (RAST) server (3) with Glimmer3 as the gene caller (4). RAST predicted 5,030 coding sequences (CDSs) with an average length of 880 bp (3,815 of which have functional predictions), 86 tRNA-encoding genes, and 25 rRNA-encoding genes. RAST was also used to construct a draft metabolic model (5) containing 1,156 genes associated with 1,408 reactions (including 4 gap-filling reactions and an artificial biomass reaction) and 1,112 metabolites.
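
How a draft model like this can flag candidate gene deletions is sketched below at toy scale. The example is a minimal, hypothetical illustration: it evaluates gene-protein-reaction (GPR) rules after a knockout, then uses a simple network-expansion (reachability) check to ask whether a target metabolite can still be made. All gene IDs (g1 to g6), reaction IDs, and the lumped pathway from glucose to the heparosan precursors UDP-GlcA and UDP-GlcNAc are invented; this is not the RAST model or its file format, and reachability is only a qualitative stand-in for the flux balance analysis typically applied to genome-scale models.

```python
# Toy, hypothetical model: all gene and reaction IDs are invented for illustration.
# A GPR rule is None (no gene association), a gene ID string, or a tuple
# ("and" | "or", [subrules]) for enzyme complexes and isozymes.
TOY_REACTIONS = {
    # reaction ID: (substrates, products, GPR rule)
    "R1": (["glc"], ["g6p"], "g1"),
    "R2": (["g6p"], ["udp_glc"], ("and", ["g2", "g3"])),      # two-subunit complex
    "R3": (["udp_glc"], ["udp_glca"], ("or", ["g4", "g5"])),  # two isozymes
    "R4": (["g6p"], ["udp_glcnac"], "g6"),                    # lumped steps
    "HEP": (["udp_glca", "udp_glcnac"], ["heparosan"], None),  # no gene rule
}


def reaction_active(rule, deleted):
    """Return True if a reaction can still be catalyzed after deleting genes."""
    if rule is None:
        return True  # no gene association: treated as always active here
    if isinstance(rule, str):
        return rule not in deleted
    op, parts = rule
    results = [reaction_active(p, deleted) for p in parts]
    return all(results) if op == "and" else any(results)


def genes_in(rule):
    """Collect every gene ID mentioned in a GPR rule."""
    if rule is None:
        return set()
    if isinstance(rule, str):
        return {rule}
    return set().union(*(genes_in(p) for p in rule[1]))


def producible(reactions, seeds, deleted=frozenset()):
    """Network expansion: fire active reactions whose substrates are all
    available until no new metabolite is added; return the reachable set."""
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products, rule in reactions.values():
            if (reaction_active(rule, deleted)
                    and set(substrates) <= available
                    and not set(products) <= available):
                available |= set(products)
                changed = True
    return available


def essential_genes(reactions, seeds, target):
    """Single-gene deletions after which the target is no longer reachable."""
    all_genes = set().union(*(genes_in(r) for _, _, r in reactions.values()))
    return [g for g in sorted(all_genes)
            if target not in producible(reactions, seeds, {g})]


if __name__ == "__main__":
    print(essential_genes(TOY_REACTIONS, {"glc"}, "heparosan"))
    # -> ['g1', 'g2', 'g3', 'g6']; g4 and g5 are redundant isozymes
    print("heparosan" in producible(TOY_REACTIONS, {"glc"}, deleted={"g4", "g5"}))
    # -> False
```

Removing R4 from the toy dictionary likewise leaves heparosan unreachable even with no deletions; that kind of network gap is what a gap-filling step repairs by adding a reaction, which in this sketch would carry no gene rule and so stay active under any knockout.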

Of particular interest, the sigma factor gene fliA (rpoF), which is required for upregulation of the flagellar regulon, was absent from the genome, along with several other flagellar biosynthetic genes. A motility assay confirmed that uropathogenic E. coli strain ATCC 23506 is nonmotile in soft tryptone agar (data not shown), consistent with a previous study of an E. coli fliA deletion mutant (6). A detailed comparative genomics study is under way between this strain and other recently sequenced strains that also produce glycosaminoglycan-like capsular polysaccharides of pharmaceutical and nutraceutical relevance. Such analyses will improve understanding of how CPS biosynthesis is regulated and how the metabolic landscape affects CPS production in pathogenic strains that depend on the capsule as "molecular camouflage" for host colonization.

Nucleotide sequence accession numbers. The annotated draft genome sequence was deposited in DDBJ/EMBL/GenBank under accession no. CAPK00000000. The version described in this paper is the first version, CAPK01000000.